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ABSTRACT 

It is a recurrent issue in astronomical data analysis that ob- 
servations are unevenly sampled or incomplete maps with 
missing patches or intentionaly masked parts. In addition, 
many astrophysical emissions are non stationary processes 
over the sky. Hence spectral estimation using standard 
Fourier transforms is no longer reliable. Spectral matching 
ICA (SMICA) is a source separation method based on co- 
variance matching in Fourier space which is successfully used 
for the separation of diffuse astrophysical emissions in Cos- 
mic Microwave Background observations. We show here that 
wavelets, which are standard tools in processing non station- 
ary data, can profitably be used to extend SMICA. Among 
possible applications, it is shown that gaps in data are dealt 
with more conveniently and with better results using this 
extension, wSMICA, in place of the original SMICA. The 
performances of these two methods are compared on simu- 
lated CMB data sets, demonstrating the advantageous use 
of wavelets. 

Keywords : blind source separation, cosmic microwave 
background, wavelets, data analysis 

1. INTRODUCTION 

The detection of Cosmic Microwave Background (CMB) 
anisotropics on the sky has been over the past three decades 
subject of intense activity in the cosmology community. 

The CMB, discovered in 1965 by Penzias and Wilson, 
is a relic radiation emitted some 13 billion years ago, when 
the Universe was about 370.000 years old. Small fiuctuations 
of this emission, tracing the seeds of the primordial homo- 
geneities which gave rise to present large scale structures as 
galaxies and clusters of galaxies, have bee n o bserved by a 
number of experiments such as Archeops |16| . Boomerang 
[T7j . Maxima [TH| and WMAP ^ . 

The precise measurement of these fiuctuations is of ut- 
most importance for Cosmology. Their statistical properties 
(spatial power spectrum, Gaussianity) strongly depend upon 
the cosmological scenarios describing the properties and evo- 
lution of our Universe as a whole, and thus permit to con- 
strain these models as well as to measure the cosmological 
parameters describing the matter content, the geometry, and 
the evolution of our Universe |2n| . 

Accessing this information, however, requires disen- 
tangling in the data the contribution of several distinct 
astrophysical sources, all of which emit radiation in the 
frequency range used for CMB observations This 
problem of component separation, in the field of CMB 
studies, has thus been the object of many dedicated studies 
in the past. 

To first order, the total sky emission is modelled as a lin- 
ear mixture of a few independent processes. The observation 
of the sky with detector d is then a noisy linear mixture of 



Nc components 



(1) 



where sj is the emission template for the jih astrophysical 
process, herein referred to as a source or a component. The 
coefficients A^j refiect emission laws while accounts for 
noise. When Nd detectors provide independent observations, 
this equation can be put in vector-matrix form : 



x{e,(i>)^AS{e,(t)) + N{e,(p) 



(2) 



where X and A'^ are vectors of length Nd, S is & vector of 
length Nc, and A is the Nd x Nc mixing matrix. 

Given the observations of such a set of independent de- 
tectors, component separation consists in recovering esti- 
mates of the maps of the sources Sj{8, (j>). Explicit component 
separation has been investigated first in CMB applications 
by |22| . |21| . and |23| . In these applications, recovering com- 
ponent maps is the primary target, and all the parameters 
of the model (mixing matrix Adj, noise levels, statistics of 
the components, including the spatial power spectra) are as- 
sumed to be known and used as priors to invert the linear 
system. 

Recent research has addressed the case of an imperfectly 
known mixing matrix. It is then necessary, to estimate it (or 
at least some of its entries) directly from the data. For in- 
stance, Tegmark et al. assume power law emission spectra for 
all components except CMB and SZ, and fit spectral indices 
to the observations . More recently, blind source separa- 
tion or independent component analysis (ICA) methods have 
been implemented specifically for CMB studies. The work of 
12], further extended by ^ implements a blind source sepa- 
ration method exploiting the non-Gaussianity of the sources 
for their separation, which permits to recover the mixing 
matrix A and the maps of the sources. 

Delabrouille et al. [J propose an approach exploiting the 
spectral diversity of components, with the new point of view 
that spatial power spectra are actually the main unknown 
parameters of interest for CMB observations. The estimation 
of a set of parameters of the model, among which the spatial 
power spectra of the components, is made using a set of band- 
averaged spectral covariance matrices in Fourier space. 

While working in the Fourier domain has a number of 
advantages, it also has a number of drawbacks. When com- 
ponents or noise are strongly non-stationary, one may wish to 
avoid the averaging induced by Fourier transforms. In addi- 
tion, when dealing with real-life observations, quite often the 
coverage is incomplete for a reason or another. Either the in- 
strument observes only a fraction of the sky, or some regions 
of the sky have to be rejected due to localised strong astro- 
physical sources of contamination : compact radiosources or 
galaxies, strong emitting regions in the galactic plane. 



Blind component separation (and in particular estima- 
tion of the mixing matrix), as discussed by Cardoso can 
be achieved in several different ways. The first of these ex- 
ploits non-Gaussianity of all but possibly one components. 
However, this is not recommended for mixtures where one 
component is close to Gaussian and all observations suffer 
from additive Gaussian noise. The component separation 
method of Baccigalupi p] and Maino [1| is based on this 
method. The second, which exploits spectral diversity (or 
non-stationarity in Fourier domain), has the advantage that 
detector-dependent beams can be handled easily, since the 
convolution with a point spread function in direct space be- 
comes a simple product in Fourier space. SMICA is an exten- 
sion of this approach to noisy observations. Finally, the third 
method exploits non-stationarity in real space. It is adapted 
to situations where components are strongly non-stationary 
in real space. 

As an extension of these last two methods, it is natural 
to investigate the possible benefits of exploiting both non- 
stationarity and spectral diversity for blind component sep- 
aration using wavelets. Indeed wavelets are powerful tools 
in revealing the spectral content of non-stationary data. In 
what follows, we first recall in section|2|the fundamental prin- 
ciples of Spectral Matching ICA. Then, after a brief reminder 
of the d trous wavelet transform, we discuss in section^lthe 
extension of SMICA for component separation in wavelet 
space in order to deal with non-stationary data. Considering 
the problem of incomplete data as a model case of practical 
significance for the comparison of SMICA and its extension 
wSMICA, numerical experiments and results are reported in 
sectional. From these, conclusions are drawn in section 

2. SMICA 

This paragraph recalls the main hypotheses and equa- 
tions of the SMICA algorithm which we actually extended to 
deal with gapped data. For ease of presentation, we concen- 
trate on the ID case since the extension to two dimensional 
data is straightforward. Detailled descriptions and discus- 
sions of this method can be found in |S| [H] and results of 
previous applications to CMB analysis can be read in 00. 

2.1 Model and cost function 

Spectral matching ICA is a blind source separation 
technique that overcomes the inseparability of Gaussian 
sources using standard ICA methods by relying on their 
assumed spectral diversity : SMICA allows us to recover 
independent Gaussian colored sources from observed noisy 
mixtures provided their spectra are substantially not 
proportional 14 . 

Considering the linear instantaneous mixing model with 
additive noise defined by (O, with the assumption that noise 
and source processes are centered, stationary and indepen- 
dent, and denoting Rx{v), Rs{v) and Rn{v) the spectral 
covariances of X, S and A'' respectively, it follows from J^J 
that for any value of the reduced frequency v £ [—0.5, 0.5], 

Rx{u) = ARs{v)A'' + Rn{v) (3) 

when we further assume independence between source and 
noise processes. Clearly, independence also implies that 
Rs{i^) and Rn{v) are diagonal matrices. 

Given a batch of T regularly spaced experimental data 
samples Xt=\-,T and a set {uq^q^i^q} of Q different reduced 
frequencies chosen a priori , estimates Rxivq) of Rx{vq) of 
the spectral covariance at these frequencies can be computed 
easily in a number of ways. The basic idea of spectral match- 
ing is to fit the model covariances of equation 10 to these 



experimental covariances by minimizing, over all or a sub- 
set of the model parameters 6 — {Rs{vq)^ RN{yq), A} , the 
functional 

Q 

<l}{e) = Y.'^i'^{^x{i^q),ARs{iyq)A'^ + RN{uq)'j (4) 

9=1 

where 'D{.,.) is a measure of the divergence between 
two covariance matrices, and Oq are weights which 
depend on q. This adjustment results in estimates 
9 = {Rs{vq), RN{i^q), A} of the model parameters and 
hence enables us to achieve the desired source separation. 
It is worth highlighting that resorting to covariances highly 
reduces data dimension, which is of great interest to 
astrophysical applications where data sets tend to become 
very large. Moreover, it may be argued in the stationary 
Gaussian case that this reduction is without significant loss 
of information 

Although any reasonable set of weights aq and diver- 
gence ©(., .) can be used in Q to assess spectral mismatch, 
this will affect the statistical properties of the estimated 
model parameters — {Rs{vq), RN{t^q), A}. Deriving a 
mismatch criterion from higher statistical principles such as 
maximum likelihood should lead to better such estimates. 

In the SMICA method, the divergence V used is given 

by 

Vkl{Ri,R2) = i (Tr(i?i J?2 ') - logdet(iiii?2"') - m) (5) 

which actually derives from the Kullback-Leibler divergence 
between two centered Gaussian distributions with size m x m 
covariance matrices Ri and i?2. Moreover, assuming constant 
source 7?^ ^ and noise i?^ ^ power spectra, over frequency do- 
mains {Fq}q^[i^Q] , SMICA uses refined unbiased estimates 
Rx q of the mixture covariance matrices Rx,q defined by : 

where X is the discrete Fourier transform of X, 

V t=0 

the Fq are non-overlapping domains in [—1/2, 1/2], symmet- 
ric with respect to zero, with their positive parts centered on 
fq, and riq is the number of ^ that fall in Fq. It follows from 

this definition that the entries of R^ q are in fact all real. 
The statistical grounds and implications of these choices are 
explored in 012] where it is shown that SMICA can be de- 
rived asymptotically from the maximum likelihood principle 
in the particular case of stationary processes in the Whittle 
approximation. This latter approximation asserts that the 
Fourier coefficients X{^) of a stationary process X{t) are 
asymptotically Gaussian, uncorrelated, centered with spec- 
tral covariance equal to Rx {■§:)■ 

As a result, the model covariance ^ is finally rewritten 

as : 

and the derived spectral matching criterion is given by 
Q 

m = E ^^^^^ {Ri.r^^L^^ + ^N.q) (9) 

9=1 



to be minimized with respect to the new set of parameters 

The previous definitions are easily extended for the 
method to be applied to real images. The above Fq are nat- 
urally replaced by 2D domains in the frequency plane 
These are best chosen, based on available prior information 
relative to source spectra, to enhance spectral diversity |14| . 
Regarding our application to CMB analysis, the supposed 
spatial stationarity and isotropy of the sources strongly 
suggests taking rings centered on the null frequency which 
are finally simply described as ID frequency bands. 

An especially important limiting case, for simulation 
purposes, is when the mixing matrix is square and invertible, 
and when the mixtures are assumed without noise. Then, as 
shown in IH], the likelihood can be directly related to a joint 
diagonalization criterion of spectral covariance matrices 
for which an efficient optimization algorithm is actually 
available. 



2.2 Parameter optimization 

Finding the model closest to the data in the sense 
of SMICA's objective function benefits from the latter's 
connection to the maximum likelihood principle and indeed 
the EM algorithm is shown to be a fruitful search method in 
Q where it is fully described. Actually, this latter algorithm 
was slightly modified in order to deal with the case of 
colored noise N in Another useful enhancement was to 
allow for constraints to be set on the model parameters so 
that prior information such as bounds on some entries of 
the mixing matrix A could be included. The details of this 
constrained EM algorithm are given in appendix IaI 

Eventually, using the EM algorithm in simulation, it 
appeared that after a quick start, convergence slowed down 
dramatically in a second stage possibly owing to poor signal 
to noise ratio in some frequency bands. In order to speed 
convergence back up, it was found profitable to alternately 
use fixed numbers of EM steps and BFGS steps |1] 12; in a 
heuristic procedure. 

An unavoidable issue in optimization is that of initiating 
the search method and this, obviously, is most critical 
when the function to be optimized is strongly suspected to 
be multimodal. Such may very well be the case with 1^. 
This point though is left aside in what follows since our 
prime interest is in the study of the statistical performances 
of different estimators of the model parameters 9. In the 
simulations discussed further down, the optimal values of 
the parameters are sought starting from the true mixing 
matrix and the spectral covarinces estimated from the initial 
separate source and noise maps. 

2.3 Component map estimation 

As by-products of the SMICA method, estimates Rg ^ 

and ^ of the different signal and noise covariances are 
obtained in the model fitting step and can be used for re- 
constructing the source maps via Wiener filtering the data 
maps in Fourier space, in each frequency band v £ Fq : 

S{u) = {A^Ri,-^'A + Ri-Y'A^Rf,-^'Xiu) (10) 

In the limiting case where noise is small compared to signal 
components, Rg~q^ is negligible and the above filter reduces 
to 

S{u) = {A^Rf,'^'Ar'A^Ri,'^'X{u) (11) 



which is also the generalized least square solution under 
Gaussian statistics. 

Note however that the Wiener filter is only one possibility 
among others for inverting ||5J|. Its optimality is true in the 
restricted case of Gaussian noise and signal processes. In 
real case applications, other inverting schemes should also 
be experimented [J. 

3. WAVELETS AND SMICA 

The SMICA method for spectral matching in Fourier 
space has proven to be a very powerful tool for CMB spectral 
estimation in multidetector experiments. It is particularly 
useful to identify and remove residuals of poorly known cor- 
related systematics and astrophysical foreground emissions 
contaminating CMB maps. However, SMICA suffers from 
several practical difficulties when dealing with real data. 

Indeed, actual components are known to depart slightly 
from the ideal linear mixture of equation JJJ. The mixing 
matrix (in particular those columns of A which correspond 
to galactic emissions) is known to depend somewhat on the 
direction of observation or on spatial frequency. Measuring 
the dependence A(6, (p) is of interest for future experiments 
as Planck, and can not be achieved directly with SMICA. 
Further, the components are known to be both correlated 
and non stationary. For instance, galactic dust emissions are 
strongly peaked towards the galactic plane. A Fourier (or 
spherical harmonics) transform inevitably mixes contribu- 
tions from high galactic sky, nearly free of foreground con- 
tamination, and contributions from within the galactic plane. 
Noise levels themselves may be quite non stationary, with 
high SNR regions observed for a long time and low SNR 
regions poorly observed. 

When there are sharp edges on the maps or gaps in 
the data, corresponding to unobserved or masked regions, 
spectral estimation using the periodogram or the Daniell- 
like smoothed periodogram as in lO is also not the most 
satisfactory procedure. Although apodizing windows may 
help cope with edge effects in Fourier analysis, they are 
not very straightforward to use in the case of arbitrarily 
shaped 2D maps with arbitrarily shaped 2D gaps, such 
as provided by the Archeops experiment ^^1- Clearly, the 
spectral analysis of gapped data requires tools different 
from those used to process full data sets, if only because the 
hypothesized stationarity of the data is greatly disturbed 
by the missing samples. 

Common such methods often amount to first trying 
to fill the gaps with estimates of the missing samples and 
then using standard spectral estimators. However, the data 
interpolation stage is critical and cann ot be completed 
without prior assumptions on the data |12| . We prefered 
to rely on methods intrinsically dedicated to the analysis 
of non-stationnary data such as the wavelet transform, 
widely used to reveal variations in the spectral content 
of time series or images, as they permit to single out 
regions in direct space while retaining localization in the 
frequency domain. We see next how to reformulate Q so 
to take advantage of wavelet transforms when dealing with 
non-stationary data. A particular case in which wavelets are 
shown to be an especially powerful tool is that of incomplete 
data. Note that in what follows, the locations of the missing 
samples are always known. 



3.1 Wavelet transform : the a trous algorithm 

We give here the necessary background on the d 
trous algorithm which, among the several possible wavelet 
transform implementations, is the one we retained in our 
simulations. With the compact supported cubic B3 spline 



as scaling function <?i(fc), or its 2D quasi-isotropic extension 
4>{k)4>{l), the d trous algorithm has been shown to be well 
suited to the analysis of atrophysical data where translation 
invariance is desi rabl e and the accent is seldom set on 
data compression |lf)|. Fo r this choice of scaling function, 
the scaling equation IjlHIl is satisfied and therefore fast 
implementations of the decomposition an d r econstruction 
steps of the d trous tranform are available |lfl| . 

Consider for instance a sampled ID signal co(fc) of length 
T. The d trous algorithm recursively produces smoother ap- 
proximations Ci to Co on a dyadic resolution scale using a 
low-pass filter h according to : 

u u 

(12) 

where h = {1/16, 1/4,3/8, 1/4, 1/16} is actually the set of 
coefficients in the scaling equation for the cubic spline : 

(^(fc) = ^ /i(ii)0(2fc - ii) (13) 

u 

We note that each Ci is the same size as the original data co 
and that the lowest resolution Jmax is obviously limited by 
data size T. Then, taking the difference between two consec- 
utive approximations gives the details at that scale or the 
wavelet coefficients 

w.{k) = c,,,{k) - c^k) = Y ^i'('^)M^) (14) 

u 

where the wavelet function i){k) is defined by : 

The uii's and Ci's given using the d trous algorithm actually 
are obtained by passing the original signal cq through a set 
of finite impulse response (FIR) filters ipi,^2, ■ ■ ■ ,'4'j,4'J- An 
essential property of these filters is that an inverse transform 
exists. In fact, reconstruction results simply from adding all 
the wavelet scales together with the last smooth approxima- 
tion : 

VA:,co(fc) = cj{k) + wj(k) + wj-i(k) + . . . + ui2(/c) + wi(fc) 

(16) 

The above d trous algorithm is easily extendable to two- 
dimensional images : 

c,(/c,0 = YYh(u,v)c^-^(k + 2'-^u,l + 2'-\){n) 

U V 

w^{k,l) = c^-i{k,l)~c,{k,l) (18) 

and the reconstruction is still a simple co-addition of the 
wavelet scales and the smooth array : 

J 

co{k,l) ^ cj{k,l) +J2Mk,l) (19) 

The use of the B3 spline leads to a convolution with the 
5x5 mask h : 
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but it is faster to compute the convolution in a separable 
way (first on rows, and then on the resulting columns). 



3.2 Spectral matching in wavelet space : wSMICA 

Consider the set of ideal band pass filters Tq associated 
with non-overlapping frequency domains Fq as used by the 
Fourier space implementation of SMICA. Let Yq denote the 
stationary Gaussian random processes obtained by passing 
the observations X of size m through filter Tq. Let Yq be their 
Fourier coefficients. Because of the unitary property of the 
Fourier transform, considering a batch of T samples Xt=i,T, 
the following equality between joint probabilities holds : 

P(Yi.,t=l,T, Yg.t^i^T) = P{Yi,k = l,T, ?Q;fc=l,T) (20) 

Assuming uncorrelated Fourier coefficients as in the above 
mentioned maximum likelihood derivation of SMICA based 
on the the Whittle approximation, and because of the non- 
overlapping filters, it follows that the Yq-t for different g's are 
also decorrelated so that : 

Q 

-l0gP(yi;t=l.T, YQ.,t=l,T) = - ^ logP(F,;fc=i,T) (21) 

and that Vg : 

^10gP(F,;fe = l,T) = -10gP(Y,;fc^l,T) 

= nqVKL [%^q, ARi^^A^ + i?^,,) ^^^^ 

Now define mixture, source and noise covariances Rx^q, 
R's,q and R%^q in the time domain at the output of the above 
filters. The former matrices can be estimated from the avail- 
able data using : 

1 

Rx,q = y E ^"--'^It (23) 
t=o 

and nothing opposes attempting component separation by 
spectral matching in the time domain using these latter co- 
variances by minimizing 

Q 

m = E a,T^{Rx.q,ARlqA^ + (24) 

q = l 

with respect to 9 = {A, Rg ,j, R'j^^q), provided the estimated 
covariances are full rank matrices. However, deriving 
adequate weights aq in order to get a good approximation 
of the likelihood is not straightforward because of the 
correlations between the Yq-is at different t's. In fact, owing 
to these correlations, the convergence of Rx,q to Rx,q can 
be very slow. The helpful point equation H22|l actually makes 
is that taking — riq will correctly reffect our confidence 
in the estimated covariances R^x,q- 

The next step is obviously to use another set of filters in 
place of the ideal band pass filters used by SMICA. In fact, 
in dealing with non stationary data or, as a special case, 
with gapped data, it is especially attractive to consider 
finite impulse response filters. Indeed, provided the response 
of such a filter is short enough compared to data size T 
and gap widths, not all the samples in the filtered signal 
will be affected by the gaps. Therefore, using these latter 
samples exclusively, one may expect better estimation of 
the statistical properties of the original data i.e. with- 
out the gaps. We choose in what follows to use filters 
%p-i,il)2, ■ ■ ■ ,^j,4>j (see figure and the wavelet d trous 
algorithm described previously. An immediate consequence 
of this choice is that the decorrelation between the different 



filter outputs no longer holds, due to their overlapping 
responses in Fourier space. However, we do benefit from the 
fast filtering algorithms and, which is quite significant, from 
the possibility of reconstructing estimated source templates. 

Let us consider again a batch of T regularly spaced data 
samples Xt^i_,T. Possible gaps in the data are simply de- 
scribed with a mask /i i.e. a vector of zeroes and ones the 
same length as X with ones corresponding to samples out- 
side the gaps. Denoting Wi, W2, . . . , Wj and Cj the wavelet 
scales and smooth approximation of X, obtained with the d 
trous transform and fii, . . . , fij+i the masks for the different 
scales determined from the original mask fi{t) knowing the 
different filter lengths, wavelet covariances are estimated as 
follows : 

1 ^ 

(25) 

where k is the number of non zero samples in pi. With source 
and noise covariances Rs,ij Rn.z defined in a similar way, the 
covariance model in wavelet space becomes 

Rl,^ = AR^^^A^ + R^^^ (26) 

and minimizing 

Q 

m = „AJ?^_,At ^) (27) 

9 = 1 

with respect to the model parameters 9^ = {A, R^^^, R'^ ^) 
achieves the desired component separation. 

However, in order for (l>{6) to be a good approximation 
to the likelihood, the weights again have to be deter- 
mined with care. These weights should account for the cor- 
relations between wavelet coefficients from different or the 
same scales, especially in the lower frequencies. Actually, ex- 
agerating the so-called decorrelating property of the wavelet 
transform, we assume coefficients from different scales are 
uncorrelated. Nevertheless, coefficients from one same scale 
are strongly correlated, especially with the adopted d trous 
redundant transform. Then, in the case of complete data sets 
i.e. without gaps, and because the ID wavelet filter length 
in the time domain doubles from scale to scale, the transpo- 
sition of equation 12211 leads to taking : 

r 1 r 1 1 1 1 1 , ^ 

|ai,a2, ...,aj,aj+i} = {-, -, ^} (28) 
In the 2D case, this becomes : 

3 3 3 1 
{ai,a2,...,aj,a./+i} = {j, r^,-, jj, p-} (29) 

However, when there are gaps in the data, the Fourier modes 
can be strongly correlated and the Whittle approximation is 
no longer appropriate. In order to derive an approximate 
likelihood function, consider the orthogonal discrete wavelet 
transform. In the ID case, this is a non-redundant transform 
in which the number of coefficients is halved from scale to 
scale. It is common and quite convenient to assume these 
coefficients are uncorrelated. Denoting if the number of 
DWT coefficients unaffected by the gaps in scale i, these 
have the same statistical significance or information content 



as the li ^ 2' X if^"^ coefficients in scale i determined with 
the d trous wavelet transform. Finally, a good approximation 
to the likelihood is obtained taking 

r -, ,-^1 '2 Ij h+i-, 

|ai,Q2, ...,aj,aj+i\ = Ig- - gJ' ~2^J ^^^^ 

or, in the 2D case, : 

r , ,3Zi 3^2 3ij ij+i, , , 

{ai,OL2, ...,aj,aj+i\ = { — . (31) 

in equation H27|l . We will refer to this combination of 
principles from SMICA and wavelet transforms as wSMICA. 

A point to be stressed here is that the number of bands 
in the case of wSMICA is very much limited by the original 
data size, which is not as strongly the case with SMICA. 
But this limitation is m ostly a requirement for reconstruc- 
tion using llljl and Ijl6|l to make sense. If the mixing matrix 
^ is a parameter of greater interest and if there is no real 
need to estimate source maps S, then there is no objection 
in principle to using more redundant transforms such as the 
continuous wavelet transform, or in fact any set of linear fil- 
ters (of finite impulse response to cope easily with edges and 
gaps). This in turn rai ses t he question of optimally choosing 
this set of filters as in |12| . 
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Fig. 1 - Magnitudes of the cubic spline wavelet fil- 
ters Vii V'a, • • • , ^5 used in the simulations described 
further down. The vertical dotted lines for v — 
{0.013, 0.025, 0.045, 0.09, 0.2, 0.5} delimit the five frequency 
bands used with SMICA in these simulations. 



4. NUMERICAL EXPERIMENTS 

4.1 Simulated data 

The methods described above were applied to synthetic 
observations consisting of m = 6 mixtures of n = 3 com- 
ponents namely CMB, galactic dust and SZ emissions for 
which typical templates, shown on figure El were obtained 
as described in 

The templates, and thus the mixtures in each simulated 
data set, consist of 300 x 300 pixel maps corresponding 
to a 12.5° X 12.5° field located at high galactic latitude. 
The six mixtures in each set mimic observations that will 
eventually be acquired in the six frequency channels of the 
Planck-HFI on part-sky, local maps. The entries of the 
mixing matrix A used in these simulations actually are 
estimated values of the electromagnetic emission laws of the 




Fig. 2 - Simulated component templates for CMB (top), 
DUST (middle), SZ (bottom). 



original components at 100, 143, 217, 353, 545 and 857 GHz. 
These values are grouped in tabled 



7.452 X 10" 
5.799 X 10" 
3.206 X 10" 
7.435 X 10" 
6.009 X 10" 
6.115 X 10" 



3.654 X 10" 
7.021 X 10" 
1.449 X 10" 
3.106 X 10" 
5.398 X 10" 
7.648 X 10" 



-8.733 X 10" 
-4.689 X 10" 
-2.093 X 10" 
1.294 X 10" 
2.613 X 10"- 
5.268 X 10"' 



100 GHa 
143 GHz 
217 GHz 
353 GHz 
545 GHz 
857 GHz 



Tab. 1 - Entries of A, the mixing matrix used in our simu- 
lations. 



—6, —3, and +3 dB from nominal values. Table [2| gives the 
typical energy fractions that are contributed by each of the 
n = 3 original sources and noise, to the total energy of each 
of the m — & mixtures, considering Planck nominal noise 
variance. In fact, because SMICA and wSMICA actually 
work on spectral bands, a much better indication of signal 
to noise ratio in these simulations is given by figure 01 
where it is shown how noise and source energy contribu- 
tions distribute with respect to frequency in the six mixtures. 
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1.24 : 



10" 



100 GHz 
143 GHz 
217 GHz 
353 GHz 
545 GHz 
857 GHz 



Tab. 2 - Energy fraction contributed by each source to the 
total energy of each mixture, for the nominal noise variance 
on the Planck HFI channels. 






Fig. 3 - Energy contributed by each source and noise to 
each of the six mixtures (mixture 1 : top left, mixture 6 : 
bottom right) as a function of frequency, for the nominal 
noise variance on the Planck HFI channels. Note how SZ is 
expected to always be below nominal noise, that CMB and 
dust strongly dominate in different channels and that CMB 
and dust spectra, without being proportional, display the 
same general behaviour dominated by low modes. 



White Gaussian noise was added to the mixtures 
according to equation J^J in order to simulate instrumental 
noise. While the relative noise standard deviations between 
channels were set according to the nominal values of the 
Planck HFI, we experimented five global noise levels at —20, 



Finally, in order to investigate the benefits of using 
wSMICA in place of SMICA when gaps are inserted in the 
data, the mask shown on figure 0[ was applied onto the 
mixture maps. The case where no data is missing was also 
considered for the sake of comparison. In each of these two 



particular configurations, spectral matching was assessed 
and optimized both at the output of the five wavelet filters 
ipi,...,tp5 associated to higher frequency details, and on 
the corresponding five bands in Fourier space, as shown on 
figure U This latter choice of frequency bands is simply 
made to ease comparison between SMICA and wSMICA. 
It may be argued that this choice is probably not optimal 
to run SMICA. But, in fact, the optimal selection of filters 
is clearly a meaningful question both for SMICA and 
wSMICA. This will require further investigation. 






Fig. 4 - Mask used to simulate a gap in the data (top left), 
and the modified masks at scales 1 (top right) through 5 
(bottom left). The discarded pixels are in black. 



• use the algorithm in [S] to jointly diagonalize the 
covariances in each configuration, and keep the 
resulting separating matrices. 

If the sources have satisfactory spectral properties, the 
obtained separating matrices should not depart drastically 
from the identity matrix. Moreover, denoting A- any invert- 
ible 3x3 mixing matrix, and the resulting separat- 

ing matrix, it is shown in ^1] that the variances of the off- 
diagonal terms in A~^A depend only on spectral diversity, 
in the case of Gaussian sources. In fact, to assess the ef- 
fect of any non-Gaussianity or non-stationarity in the source 
templates, the same experiment was repeated on Gaussian 
maps generated with the same spectra as the CMB, Dust 
and SZ components. In any case, the independent source 
components are separated using : 



S ^ A'^AS =IS 



(32) 



so that with the above normalization, the square of any 
off-diagonal term lij is directly related to the residual level 
of component j in the recovered component i. 
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4.2 Preliminary results 

Preliminary experiments were conducted in the case of 
vanishing instrumental noise variance, with a square 3x3 
mixing matrix. It was mentioned before that in this limit, 
the spectral matching objective boils down to the joint 
diagonalization of covariance matrices. Further, taking the 
mixing matrix to be the identity matrix {i.e. try to separate 
sources which are not actually mixed ), it is possible to gain 
some insight on the spectral diversity of the independent 
sources, for a given choice of bands or filters. Indeed, the 
performance of the independent component separation 
methods based on spectral matching depend highly on 
spectral diversity. 

The following steps were repeated 1000 times : 

• randomly pick one of each component maps out of the 
available 200 CMB maps, 30 dust maps and 1500 SZ 
maps. 

• calculate covariance matrices in the five wavelet or 
Fourier bands, both with and without masking part 
of the maps, as is all described above. 

• normalize each source so that its total energy over the 
five bands is equal to one. 



Fig. 5 - Histograms of the off diagonal term corresponding to 
the residual corruption of "CMB" by "Dust" while separat- 
ing Gaussian maps generated with the same power spectra as 
the astrophysical components, by joint diagonalization of co- 
variance matrices in Fourier (left) and wavelet (right) space, 
with (black, which appears grey when seen through white ) 
and without (white) masking part of the the data. The dark 
widest histogram on the left highlights the impact of masking 
on source separation based on Fourier covariances. 



The histograms on figure are for the off diagonal 
term corresponding to the residual corruption of CMB by 
Gaussian Dust in the second set of experiments. In tables 
OandUl the results obtained with the synthetic component 
maps are given as well as those obtained with the Gaussian 
maps, in terms of the s tan dard deviations of the off-diagonal 
entries lij defined by 13211 . 

Interestingly, when working on Gaussian maps without 
masks, using covariances in Fourier space or in wavelet 
space gives similar performances. It is also satisfactory, 
when covariances in wavelet space are used with Gaussian 
maps, that each computed standard deviation only slightly 
increases when a mask is applied on the data. Indeed, as a 
consequence of incomplete coverage, there are less samples 
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Tab. 3 - Standard deviations of the off-diagonal entries X,j 
defined by %\2\ obtained while separating realistic compo- 
nent maps by joint diagonalization of covariance matrices in 
Fourier space, with (M) or without masking (NM) part of 
the data, or applying an apodizing Hanning window {Han). 
Components 1, 2 and 3 respectively stand for CMB, Dust and 
SZ. The numbers in italic were obtained with Gaussian maps 
and the underlined numbers correspond to the histograms in 
figure 
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Tab. 4 - St and ard deviations of the off-diagonal entries 2ij 
defined by Ij32|l obtained while separating realistic compo- 
nent maps by joint diagonalization of covariance matrices 
in wavelet space, with (M) and without masking [NM) 
part of the data. Components 1, 2 and 3 respectively stand 
for CMB, Dust and SZ. The numbers in italic were obtained 
with Gaussian maps and the underlined numbers correspond 
to the histograms in figure]^ 



from which to estimate the covariances. This increase is also 
observed when covariances in Fourier space are used with 
the Gaussian maps but it can be as high as five-fold and 
it does not affect all coefficients the same way. Although 
this can again be attributed to the reduced data size, the 



lowered spectral diversity between components, because of 
the correlations and smoothing induced in Fourier space 
by the mask, is also part of the explanation. In fact, as 
shown on figure O CMB and dust spatial power spectra 
are somewhat similar, i.e. show low spectral diversity, and 
further smoothing can only degrade the performance of the 
source separation algorithm based on Fourier covariances. 

In the case of realistic component maps, we note first 
that the comparison of the performance of component 
separation using wavelet covariances with and without mask 
again agrees with the different data sizes, which is not 
the case with covariances in Fourier space. Next, whether 
covariances in Fourier or wavelet space are used, we note 
that the terms coupling CMB and Dust are again much 
higher in magnitude, even on complete maps. It seems 
that the actual non-stationarity and non-Gaussianity of the 
realistic component maps are relevant issues. Another point 
is that the CMB and Dust templates as in figure El exhibit 
sharp edges compared to SZ and this inevitably disturbs 
spectral estimation using a simple DFT. To assess this effect, 
simulations were also conducted where the covariances in 
Fourier space were computed after an apodizing Hanning 
window was applied on the complete data maps. The results 
reported in table El to be compared to table (31 do indicate a 
slightly positive effect of windowing, but still the separation 
using wavelet covariances appears better. 



4.3 Realistic experiments 

The above preliminary results clearly point out in the 
noiseless case the advantageous use of wavelets to easily 
escape the very bad impact that gaps and sharp edges 
actually have on the performance of the source separation 
using covariances in Fourier space. Hence this is strong 
encouragement to move on to investigating the effect of 
additive noise on the mixture maps according to JU, using 
SMICA and its extension wSMICA. We note that although 
in the case of wSMICA the link with maximum likelihood is 
not as strongly asserted as with SMICA, the optimization 
algorithm used in the simulations hereafter consists in both 
cases of the same heuristic succession of EM and B FGS 
steps and initialization is done as discussed in DaragraDh l2.2l 

Picking at random one of each component maps out 
of the available 200 CMB maps, 30 dust maps and 1500 
SZ maps, 1000 synthetic mixture maps were generated as 
previously described, for each of the 5 noise levels chosen. 
Then, component separation was conducted using the 
spectral matching algorithms SMICA and wSMICA both 
with and without part of the maps being masked. Now, 
each run of SMICA and wSMICA on the data returns 
estimates Af and Aw of the mixing matrix. Clearly, these 
estimates are subject to the indeterminacies inherent to the 
instantaneous linear mixture model Indeed, in the case 
where optimization is over all parameters 9, it is obvious 
that any simultaneous permutation of the columns of A 
and of the lines of S leaves the model unchanged. The 
same occurs when exchanging a scalar possibly negative 
factor between any column in A and the corresponding 
line in S. Therefore, columnwise comparison of A/ and An, 
to the original mixing matrix A requires first fixing these 
indeterminacies. This is done by hand after Af and A^ have 
been normalized columnwise. 

The results we report next concentrate on the statistical 
properties oi Af and A^ as estimated from the 1000 runs 
of the two competing methods in the several configurations 
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Fig. 6 - Comparison of the mean squared errors on the es- 
timation of the emissivity of CMB as a function of noise 
in five different configurations namely : wSMICA without 
mask, wSMICA with mask, fSMICA without mask, fSMICA 
with mask, fSMICA with Hanning apodizing window. 




noise level in dB relative 1o nominal values 



Fig. 8 - Comparison of the mean squared errors on the 
estimation of the emissivity of SZ as a function of noise 
in five different configurations namely : wSMICA without 
mask, wSMICA with mask, fSMICA without mask, fSMICA 
with mask, fSMICA with Hanning apodizing window. 
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Fig. 7 - Comparison of the mean squared errors on the es- 
timation of the emissivity of DUST as a function of noise 
in five different configurations namely : wSMICA without 
mask, wSMICA with mask, fSMICA without mask, fSMICA 
with mask, fSMICA with Hanning apodizing window. 



retained. In fact, the correct estimation of the mixing matrix 
in model Q is a relevant issue for instance when it comes 
to dealing with the cross calibration of the different detec- 
tors. Figures 013 and El show the results obtained, using the 
quadratic norm 



(33) 



with A = Af or Am and j = CMB, DUST or SZ, to assess 
the residual errors on the estimated emissivities of each com- 
ponent. The plotted curves show how the mean of the above 
positive error measure varies with increasing noise variance. 
For the particular case of CMB, table gives the estimated 
standard deviations of the relative errors 



Aij Aij 



(34) 



on the estimated CMB emissivity in the six channels of 



Planck's HFI in the different configurations retained. 



^31 



Ail 



Aei 



4.4* 
5.4* 
6.6* 
9.4* 
1.2* 



1.6* 
5.3* 
7.0* 
1.0* 
1.4* 



1.7* 
2.1* 
2.7* 
3.3* 



1.8* 
1.9* 
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2.7* 
3.0* 
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4.2* 10" 



5.8* 10 
6.2* 10 
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2.6* 
3.0* 
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1.0*10" 
1.5* 10" 
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2.6* 10" 
2.9* 10" 
3.3* 10" 



2.7*10"^ 
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Tab. 5 - Standard deviations of the relative errors on 
the estimated emissivities An of CMB in Planck's HFI six 
channels. The colunm labels WNM, WM, FNM, FM, FHan 
are for the different configurations, respectiveley : wSMICA 
without mask, wSMICA with mask, fSMICA without mask, 
fSMICA with mask, fSMICA with Hanning apodizing win- 
dow. The five figures in each box are for noise variance -20, 
-6, -3, and 3 dB from nominal Planck values. 



Closer to our source separation objective, a more signifi- 
cant way of assessing the quality of A/ and as estimators 
of the mixing matrix A, would be to use the following signal 
to interference ratio : 



residuals in SZ 



^2 2 
rep _ -^jj'^j 

where the a, are the source variances and 



(35) 



(36) 



with -Rjv the noise covariance. The plots on figures Ifll 1101 and 
ITTlshow how the mean ISR from the 1000 runs of SMICA and 
wSMICA in different configurations, varies with increasing 
noise. 
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Fig. 11 - Comparison of the mean ISR for SZ as a function 
of noise in five different configurations namely : wSMICA 
without mask, wSMICA with mask, fSMICA without mask, 
fSMICA with mask, fSMICA with Hanning apodizing win- 
dow. 



missing. However this is not always the case with SMICA. 
Finally this set of simulations, conducted in a more realistic 
setting with respect to ESA's Planck mission, again confirms 
the higher performance, over Fourier analysis, that we indeed 
expected from the use of wavelets. The latter are able to 
correctly grab the spectral content of partly masked data 
maps and from there allow for better component separation. 



Fig. 9 - Comparison of the mean ISR for CMB as a func- 
tion of noise in five different configurations namely : wS- 
MICA without mask, wSMICA with mask, fSMICA without 
mask, fSMICA with mask, fSMICA with Hanning apodizing 
window. 
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noise level in dB relative to nominal values 

Fig. 10 - Comparison of the mean ISR for DUST as a 
function of noise in five different configurations namely : wS- 
MICA without mask, wSMICA with mask, fSMICA without 
mask, fSMICA with mask, fSMICA with Hanning apodizing 
window. 



We note again that the performance of wSMICA behaves 
as expected when noise increases and if part of the data is 



5. CONCLUSION 

This paper has presented an extension of the Spectral 
Matching ICA algorithm to the case where the collected 
data is both correlated and non stationary, considering maps 
with gaps as a particular instance of practical significance. 
It was shown that simply substituting covariance matching 
in Fourier space by covariance matching in wavelet space 
enables to cope in the most general and straightforward 
way with gaps of possibly any shape. Mainly, it is the FIR 
nature of the wavelet filters used that allows the impact of 
edges and gaps on the estimated covariances and hence on 
component separation to be lowered. Optimally choosing 
the FIR filter-bank regarding a particular application is a 
possible further enhancement. 

Results obtained with simulated astrophysical data as 
expected from the Planck mission were given and these 
confirm the benefits of correctly processing existing gaps. 
Clearly, other possible types of non-stationarities in the 
collected data such as spatially varying noise or component 
variance, etc. can be dealt with very simply in a similar 
fashion using the wavelet extension of SMICA. 

In the CMB application, the mixed components have 
quite different statistical properties : some are expected to 
be very close to Gaussian whereas others are strongly non 
Gaussian. Standard ICA methods exploit the non Gaussian- 
ity of the mixed components. However, it is not clear yet 
how best to combine non Gaussianity and spectral diversity 
in order to perform better source separation. Other features 
of wavelets which are known to be powerful tools for the 
analysis and sparse representation of structured data might 
reveal useful here. 



A. APPENDIX : EM ALGORITHM WITH 
CONSTRAINTS ON THE MIXING MATRIX 

Considering Q separate frequency bands of size with 
^ riq = 1, the EM functional derived for the instanta- 
neous mixing model Q with independent Gaussian station- 
ary sources S and noise A*' is : 



m,d) = £{\ogp{x,s\e)\e] 



(37) 



with 6 = {A,Rs,i, . . . ,Rs,q,Rn,i, ■ ■ ■ ,Rn,q) and 
i. = (ii,Es,i'---'^s,Q>^]v,i>---.Ejv,Q)- The maxi- 
mization step of the EM algorithm seeks then to maximize 
^{0_,9) with respect to 9_ and the optimal 9_ is used as the 
value for 6 at the next EM step, and so on until satisfactory 
convergence is reached. Explicit expressions are easily 
derived for the optimal 6_ in the white noise case where 
an interesting decoupling occurs between the re-estimating 
equations for noi se variances, source variances and the 
mixing matrix 

Linear equality constraints 

When A is subject to linear constraints, the joint 
maximization of the EM functional with respect to all 
model parameters is no longer easily achieved in general. 
In fact, one cannot simply decouple the re-estimating 
rules for the noise parameters and the mixing matrix and 
these have to be optimized separately. We give next the 
modified re-estimating equations for the mixing matrix 
and the source variances in the case of constant noise {i.e. 
e = {A,Rs,u.--,Rs,Q) ). 

First, let us exhibit the quadratic dependence of the EM 
functional ^{9, 8) on A : 



>(^,e) = -i^rz,tr(Ai?r4^i?^' 



where 



- ^fl^'^i?^^, - Rq'A'^RN^g) + const A (38) 



iA^R],^^A + Rs\)-' (39) 



W,= {A^R^^^A + Rl'^r'A^RJ,^^ (40) 

n„ — 



Uq — 



Rx,gW^ (41) 

WqRx.gW^ + C, (42) 



In the white noise case, RN,q ~ Rn, equation 13811 
becomes : 



= -^tr[iA- R'''R'"-')R' 



where 



(A - R'^'R^'-yR]^' + const A (43) 



i?"" = Yl ^iK" and R'" = ngRl" (44) 



Again, this can be re-written as : 

Hi.d) = - '^{A - M)Q{A - M)'' + constA (45) 



where 



A = vectA , Q = Rn_-'^ ® Y "^iK" (46) 



M — vect 



(Y n.R^^ 



(47) 



With "vect", we build a column vector with the entries of 
a matrix taken along its lines. Now let us consider linear 
constraints on the mixing matrix, specified as follows : 



CHA-Ao) 







(48) 



where C is a matrix with as many columns as constraints, and 
the columns of C are the same size as A. The maximum of 
the EM functional with respect to 6 subject to the specified 
linear constraints is then reached for : 



A = M-QC(C^QC) C''{M-Ao 



and 



(49) 



(50) 



B:s,q = diag(-R^'') 

where "diag" returns a matrix with the same diagonal 
entries as its input argument. 

In the free noise case, things are quite similar except 
that the noise covariance matrices RN,q do not factorize out 
as nicely. The EM functional is again expressed as : 

*(^> 9) = -^(A- M)Q{A - My + constA (51) 



where in this case 



Q = Y-' 



and 



M = Q Vect ( Y nqRN^Rq 



(52) 



(53) 



Then, the maximum of the EM functional with respect to 
9 subject to the specified linear constraints is again reached 
for : 

A = M-Qc(c''QCy^C''{M~Ao) (54) 

and 

Es,, = diag(i?r) (55) 

These expressions of the re-estimates of the mixing ma- 
trix can become algorithmically very simple when for in- 
stance the linear constraints to be dealt with affect separate 
lines of A, or even simpler when the constraints are such that 
the entries of A are affected separately. 

Positivity constraints on the entries of A 

Suppose a subset of entries of A are constrained to be 
positive. The maximization step of the EM algorithm on A 
alone, again has to be modified. We suggest dealing with such 
constraints in a combinatorial way rephrasing the problem 
in terms of equality constraints. If the unconstrained max- 
imum of the EM functional is not in the specified domain, 
then one has to look for a maximum on the borders of that 
domain : on a hyperplane, on the intersection of two, or three, 
or more hyperplanes. One important point is that the max- 
imum of the EM functional with respect to A subject to a 
set of equality constraints will necessarily be lower than the 
maximum of the same functional considering any subset of 
these equality constraints. Hence, not all combinations need 
be explored, and a Branch and Bound type algorithm is well 
suited PSl- A straightforward extension allows to deal with 
the case where a set of entries of the mixing matrix are con- 
strained by upper and lower bounds. 
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